ABSTRACT

Complex networks of real-world systems are believed to be
controlled by common phenomena, producing structures far
from regular or random. These include scale-free degree
distributions, small-world structure and assortative mixing by
degree, which are also the properties captured by different
random graph models proposed in the literature. However,
many (non-social) real-world networks are in fact
disassortative by degree. Thus, we here propose a simple evolving
model that generates networks with most common
properties of real-world networks including degree disassortativity.
Furthermore, the model has a natural interpretation for
citation networks with different practical applications.

Categories and Subject Descriptors 

I.6.4 [Computing Methodologies]: Simulation and
Modeling - Model validation and analysis; D.2 [Data
Structures]: Graphs and networks
1. INTRODUCTION

Networks are the simplest representation of complex
systems of interacting parts. Examples of these are ubiquitous
in practice, including large social networks [5], information
systems and corporate ownerships, to name just a
few. Despite a seemingly plain form, real-world networks
reveal characteristic structural properties that are absent from
regular or random systems [23] [5]. Thus, networked systems
are believed to be controlled by common phenomena.

Scale-free degree distributions [2], small-world
phenomena [23], degree mixing [14] (i.e., degree correlations at links'
ends) and existence of communities [6] (i.e., densely linked
groups of nodes) are perhaps among the most widely analyzed
properties of large real-world networks. Note that
community structure implies assortative (i.e., positively correlated)
mixing by degree [16], which can be seen as a tendency of
hubs (i.e., highly linked nodes) to cluster together. The
above are also the properties captured by many random
graph models proposed in the literature [9] [13] [24].

However, most (non-social) networks deviate from this
picture. Biological and technological networks are in fact degree
disassortative (i.e., negatively correlated), while different
information networks often reveal no clear degree mixing [14]
(see Figure 1). Thus, we here propose an evolving random
graph model based on the link copying mechanism [9]. Each
newly added node explores the network using the burning
process in [11], while links of the visited nodes are copied
independently of the latter. The model generates scale-free
small-world networks with community structure and also
degree disassortativity. Furthermore, it has a natural
interpretation for citation networks. The above process imitates an
author of a paper including references into the bibliography
(i.e., its citation dynamics), which enables different practical
applications in bibliometrics (see Section 3.2).

The rest of the paper is structured as follows. Section 2
introduces the proposed (Citation) model, while a thorough
analysis is given in Section 3. Section 4 concludes the paper.

2. THE CITATION MODEL 

Let a network be represented by a simple graph $G(N, L)$,
where $N$ is the set of nodes, $|N| = n$, and $L$ is the set of
links, $|L| = m$. Next, let $\Gamma_i$ be the set of neighbors of node
$i \in N$ and let $k_i$ be its degree, $k_i = |\Gamma_i|$. Last, let $k$ be the
mean degree and $k_N$ the mean neighbor degree.

The proposed graph model is based on the burning process
of Forest Fire model [11], which we introduce first. For
simplicity, the model is presented for undirected networks.




Figure 1: Data mining part of Cora citation
network [12] with highlighted hubs (i.e., 1% of most
highly linked nodes) that are scattered across the
network. (Node sizes are proportional to degrees.)




[Figure 2 panels: (a) Forest Fire model, (b) Butterfly model, (c) Copying model, (d) Citation model (ours)]

Figure 2: Schematic representation of linking dynamics of different graph models. (a) In Forest Fire
model [11], newly added node i selects an ambassador a (blue node) uniformly at random and links to it
(solid arrow). Next, some of its neighbors are taken as the ambassadors (e.g., y and z) and the process
repeats. (b) Butterfly model [13] forms links only with some fixed probability (dashed arrows). (c) In Copying
model [9], node i links to a and also to some of its neighbors x, z (green nodes). (d) Proposed Citation
model forms links only with the neighbors of the ambassador a (e.g., x and y); however, i can still link to a.



Let p be the burning probability, $p \in [0, 1/2)$ (see below).
Initially, the network consists of a single node, while for each
newly added node i, the burning process proceeds as follows.

(1) i chooses an ambassador $a \in N$ uniformly at random
(we say that i burns a) and links to it.

(2) i randomly selects (at most) $x_p$ neighbors of a that were
not yet burned, $a_1, \ldots, a_{x_p} \in \Gamma_a$, and links to them. ($x_p$ is
sampled from a geometric distribution with mean $p/(1-p)$.)

(3) $a_1, \ldots, a_{x_p}$ are taken as the ambassadors of i (step (2)).

Since each node can be visited at most once, the burning
process is guaranteed to terminate. Thus, to generate a network with
n nodes, the model repeats the above procedure $n - 1$ times.
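
The burning process above can be sketched in a few lines of Python. This is only a minimal illustration of steps (1)-(3) for undirected networks, not a reference implementation: the function names `geometric` and `forest_fire`, the adjacency-set representation and the `seed` argument are assumptions made for the example.

```python
import random
from collections import deque


def geometric(prob, rng):
    """Sample x >= 0 with P(x) = (1 - prob) * prob**x (mean prob / (1 - prob))."""
    x = 0
    while rng.random() < prob:
        x += 1
    return x


def forest_fire(n, p, seed=None):
    """Grow an undirected network with n nodes; returns {node: set of neighbors}."""
    rng = random.Random(seed)
    adj = {0: set()}                      # the network starts as a single node
    for i in range(1, n):
        first = rng.randrange(i)          # step (1): uniform random ambassador
        burned = {first}                  # here every burned node is also linked
        queue = deque([first])
        while queue:
            a = queue.popleft()
            unburned = sorted(v for v in adj[a] if v not in burned)
            x_p = min(geometric(p, rng), len(unburned))
            chosen = rng.sample(unburned, x_p)   # step (2): burn and link
            burned.update(chosen)
            queue.extend(chosen)                 # step (3): new ambassadors
        adj[i] = set(burned)
        for v in burned:
            adj[v].add(i)
    return adj
```

For instance, `forest_fire(1000, 0.3, seed=1)` returns a connected network, since every new node links at least to its first ambassador.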

Forest Fire model produces shrinking diameters and
densification phenomena observed in temporal networks [11].
Furthermore, generated networks are scale-free and
small-world, and reveal a pronounced community structure.
However, in contrast to many real-world networks, the model
gives degree assortative networks (see Section 3).

The model also has a natural interpretation for citation
networks. The burning process imitates an author of a paper
including references into the bibliography (i.e., citation
dynamics). The author first reads a related paper, or selects the
paper that triggered the research, and cites it (step (1)).
The author then considers its bibliography for other related
papers (step (2)). Some of these are further considered and
also cited, while the author continues as before (step (3)).
Nevertheless, Forest Fire model fails to reproduce some of
the properties of citation networks (e.g., degree mixing).

Note that the described process assumes that authors read,
or at least consider, all the papers they cite. However, this
is indeed not the case [17]. For example, the seminal work on
random graphs conducted by Erdős and Rényi [4] is
perhaps among the most widely cited papers in network science
literature, although, presumably, only a smaller number
of authors have actually read the original paper. As the
work is widely discussed elsewhere, most authors have just
copied the reference from another paper. On the other hand,
authors also do not cite all the papers they read, even when
these are related to their work. This can be simply due to space
limitations. Nevertheless, a paper can still be read thoroughly,
with many of its references further considered and cited.

Examples suggest that the papers that authors read or cite 
are selected due to two, not necessarily dependent, processes. 
We thus propose a Citation model that adopts the above 
burning procedure to traverse the network, while the links 
are formed according to another independent process. 



Let q be the linking probability, $q \in [0, 1)$ (see below).
Initially, the network consists of a single link, while for each
newly added node i, the model proceeds as follows.

(1) i chooses an ambassador $a \in N$ uniformly at random.

(2) i randomly selects (at most) $x_p$ neighbors of a that were
not yet burned, $a_1, \ldots, a_{x_p} \in \Gamma_a$.

(3) i randomly selects (at most) $x_q$ neighbors of a that were
not yet linked, $j_1, \ldots, j_{x_q} \in \Gamma_a$, and links to them.

(4) $a_1, \ldots, a_{x_p}$ are taken as the ambassadors of i (step (2)).

Details are the same as before. Again, the process is guaranteed
to terminate, while the entire procedure is repeated $n - 2$ times.
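
Steps (1)-(4) can be sketched as follows. This is a minimal Python illustration under stated assumptions rather than a definitive implementation: the names `geometric` and `citation_model` are hypothetical, $x_q$ is assumed to be geometric with mean $q/(1-q)$ (by analogy with $x_p$), and nodes are added until n of them have at least one link, which is one way to ignore isolated nodes.

```python
import random
from collections import deque


def geometric(prob, rng):
    """Sample x >= 0 with P(x) = (1 - prob) * prob**x (mean prob / (1 - prob))."""
    x = 0
    while rng.random() < prob:
        x += 1
    return x


def citation_model(n, p, q, seed=None):
    """Return {node: set of neighbors} with n non-isolated nodes (assumed stopping rule)."""
    if not (0 <= p < 1 and 0 < q < 1):
        raise ValueError("need 0 <= p < 1 and 0 < q < 1")
    rng = random.Random(seed)
    adj = {0: {1}, 1: {0}}                # the network starts as a single link
    linked_nodes, i = 2, 2
    while linked_nodes < n:
        first = rng.randrange(i)          # step (1): ambassador is burned, not linked
        burned, links = {first}, set()
        queue = deque([first])
        while queue:
            a = queue.popleft()
            unburned = sorted(v for v in adj[a] if v not in burned)
            unlinked = sorted(v for v in adj[a] if v not in links)
            # step (2): neighbors to burn; step (3): neighbors to link (independent draws)
            to_burn = rng.sample(unburned, min(geometric(p, rng), len(unburned)))
            to_link = rng.sample(unlinked, min(geometric(q, rng), len(unlinked)))
            burned.update(to_burn)
            links.update(to_link)
            queue.extend(to_burn)         # step (4): burned nodes become ambassadors
        adj[i] = set(links)
        for v in links:
            adj[v].add(i)
        linked_nodes += bool(links)
        i += 1
    return {v: nb for v, nb in adj.items() if nb}   # drop isolated nodes
```

Note that the sets `burned` and `links` are maintained separately, so a node may be linked without being burned and vice versa, which is the distinguishing feature of the model.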

Let s be the mean number of burned nodes (i.e.,
ambassadors). A node selects $x_p$ ambassadors on each step, thus,

$s \le \sum_{i=0}^{\infty} \left( \frac{p}{1-p} \right)^i = \frac{1-p}{1-2p}$.   (1)

A node will fail to form any link (i.e., become isolated)
with probability $(1 - q)^s$. Although isolated nodes are a
common property of real-world networks, they are often ignored
in practice or the network is even reduced to the largest
connected component. Thus, for the analysis here, we repeat
the procedure until the largest component has n nodes.

Since a node forms $x_q$ links on each step, the expected
network degree is (with $1 - (1 - q)^s$ correction for isolated nodes)

$k \le \frac{2 s q}{(1 - q) \left( 1 - (1 - q)^s \right)}$.   (2)




[Figure 3 panels: (a) Ambassadors s versus burning probability p, (b) Network degree k versus linking probability q]

Figure 3: Analysis of Citation model at different p
and q = 0.75 (left), and p = 0.3 and different q (right).
Solid lines show theoretical bounds in equations (1)
and (2). (Results are estimates of the mean over
100 network realizations with different n. Shaded
regions correspond to likely parameter values [10].)

Figure 4: Comparison of graph models at different p and q = 0.75 (top), and p = 0.3 and different q (bottom).
(Results are estimates of the mean over 100 network realizations with n = 1000. See also caption of Figure 3.)



Although equations (1) and (2) are only valid in the limit
of large network size, the bounds are rather tight for large
enough n (see Figure 3). Thus, given network degree k and
fixed q, one can solve the system for p, which can be used
for parameter estimation in practice (see Section 3.2).
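
The parameter estimation can be illustrated with a short numerical sketch. It assumes that the two bounds are treated as equalities of the form $s = (1-p)/(1-2p)$ and $k = 2sq / ((1-q)(1-(1-q)^s))$; these closed forms follow from the geometric means $p/(1-p)$ and $q/(1-q)$ but are an assumption of the example, as are the function names `mean_ambassadors`, `mean_degree` and `estimate_p`.

```python
def mean_ambassadors(p):
    """Assumed form of equation (1): s = (1 - p) / (1 - 2p), for 0 <= p < 1/2."""
    return (1 - p) / (1 - 2 * p)


def mean_degree(p, q):
    """Assumed form of equation (2), with the correction for isolated nodes."""
    s = mean_ambassadors(p)
    return 2 * s * q / ((1 - q) * (1 - (1 - q) ** s))


def estimate_p(k, q, tol=1e-10):
    """Solve mean_degree(p, q) = k for p by bisection (the degree grows with p)."""
    lo, hi = 0.0, 0.5 - 1e-9
    if k < mean_degree(lo, q):
        raise ValueError("k is below the smallest degree attainable for this q")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if mean_degree(mid, q) < k:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2
```

Bisection is used because, under the assumed forms, the degree increases monotonically with p for fixed q, so the solution is unique whenever it exists.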

Citation model generates small-world networks with
scale-free degree distribution and community structure (see
Section 3.1). Furthermore, in contrast to Forest Fire model,
resulting networks are degree disassortative. We stress that
the key factor here is that newly added nodes do not
(necessarily) link to their ambassadors, since it is linking to
the ambassadors that in fact produces degree assortativity.
Since a node copies the links of its ambassadors, linking to
them obviously promotes assortativity. However, in the absence
of an explicit process introducing assortativity, (scale-free)
networks are expected to be degree disassortative [8]. The
analysis in Section 3 thus also includes a variant of Forest Fire
model denoted Butterfly model, where a node links to its
ambassadors only with probability q (considered in [13]), as
well as a variant of the proposed Citation model denoted
Copying model, where a node links to each ambassador [9]
(for details see Figure 2).

Other authors have proposed models very similar to ours [21]
[9] [11] [13] [24]. Nevertheless, these either do not adopt the
burning process to traverse the network or the model
necessarily links the nodes to their ambassadors, which results
in degree assortativity. More precisely, the set of the linked
nodes is always a subset of the nodes burned (or vice versa).
However, in the case of Citation model, these two sets can
intersect arbitrarily, while they can also be disjoint.

3. EXPERIMENTAL ANALYSIS 



Section 3.1 conducts an empirical analysis of Citation model
and several alternatives proposed in the literature (see
Section 2). Next, networks constructed with different models
are compared against a larger citation network (Section 3.2).

3.1 Analysis of the model 

Figure 4 shows basic statistics of the networks generated
with different graph models for parameters p and q shown
(see Section 2). Most notably, only the proposed Citation
model gives degree disassortative networks measured by the
mixing coefficient $r \in [-1, 1]$ [14] (see Figures 4(b) and 4(g)).
r is simply a Pearson correlation coefficient of degrees at
links' ends. Thus, $r < 0$ for Citation model, while $r > 0$
for Forest Fire and Butterfly models. Observe that Copying
model also generates networks with $r < 0$ for very large
p and q; however, these are much denser than comparable
real-world networks (see Figures 4(a) and 4(f)).
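
Since r is a Pearson correlation of the degrees at links' ends, it can be computed directly from an adjacency structure. The following is a minimal sketch assuming an undirected network stored as a dictionary of neighbor sets, with every link counted once in each direction; the function name `degree_mixing` is hypothetical.

```python
def degree_mixing(adj):
    """Pearson correlation of degrees at the two ends of each link."""
    xs, ys = [], []
    for u, neighbors in adj.items():
        for v in neighbors:               # each link appears as (u, v) and (v, u)
            xs.append(len(adj[u]))
            ys.append(len(adj[v]))
    if not xs:
        raise ValueError("network has no links")
    mean = sum(xs) / len(xs)              # by symmetry, also the mean of ys
    cov = sum((x - mean) * (y - mean) for x, y in zip(xs, ys))
    var = sum((x - mean) ** 2 for x in xs)
    if var == 0:
        raise ValueError("r is undefined when all link ends have equal degree")
    return cov / var
```

A star network, where a single hub links only to nodes of degree one, gives r = -1, the extreme case of disassortative mixing.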

On the other hand, all models give small-world networks
with short mean distance between the nodes $l$ [1] (see
Figures 4(c) and 4(h)) and high transitivity measured by the
clustering coefficient $C \in [0, 1]$ [23] (see Figures 4(d) and 4(i)).
Note that C increases with p, while q has little effect on C.
Furthermore, all models generate networks with clear
community structure according to modularity $Q \in [0, 1]$ [15],
where Q is estimated using a fast multi-stage optimization [3]
(see Figures 4(e) and 4(j)). Although Q decreases with
increasing p or q in the case of Citation and Copying
models, the values are somewhat comparable to those observed
in real-world networks. Forest Fire and Butterfly models,
however, appear to overestimate Q for selected p and q.
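
For illustration, the clustering coefficient C can be computed as the mean over nodes of the fraction of linked neighbor pairs. This is a minimal sketch assuming that definition and the convention that nodes with fewer than two neighbors contribute zero; the function name `clustering_coefficient` and the dictionary-of-sets representation are assumptions of the example.

```python
def clustering_coefficient(adj):
    """Mean local clustering of an undirected network {node: set of neighbors}."""
    total = 0.0
    for u, neighbors in adj.items():
        k = len(neighbors)
        if k < 2:
            continue                      # convention: such nodes contribute 0
        nbrs = list(neighbors)
        # count links among the neighbors of u
        closed = sum(
            1
            for a in range(k)
            for b in range(a + 1, k)
            if nbrs[b] in adj[nbrs[a]]
        )
        total += 2 * closed / (k * (k - 1))
    return total / len(adj)
```

A triangle gives C = 1, while attaching a single pendant node to one of its corners lowers the mean to 7/12.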

Networks constructed with Citation model also reveal
scale-free degree distributions [2] (see Figure 5(a)); thus, the model
generates most common properties of real-world networks.

3.2 Cora citation network 

Due to a natural interpretation for citation networks (see
Section 2), the proposed model has different practical
applications in bibliometrics. We here analyze author citation
dynamics based on the well-known Cora dataset [12] that
contains computer science papers collected from the web, and
also the references automatically parsed from the
bibliographies of the papers. We extract a citation network with
n = 23166, while other statistics are reported in Table 1.

Table 1: Comparison of Cora citation network and
those constructed with different graph models for p
and q shown. (Results are estimates of the mean
over 100 network realizations with n = 23166.)

Network          Parameter   m       k       r
Cora                         89157   7.697   -0.055
Citation model   0.369               7.760   -0.047

Table 1 also includes the networks generated with Citation
and Forest Fire models, where parameters p and q were
estimated as described in Section 2. Note that Citation model
well matches the disassortative mixing regime in Cora
citation network (observe also a similar trend in Figure 5(b)),
while Forest Fire model gives degree assortative networks.
(For comparison based on other network properties see [19].)

Recall that s in equation (1) can be seen as the number of
references actually read by an author of some paper. Thus,
the fraction of papers considered by the authors, relative to
the number of all papers cited, can be estimated as 2s/k =
0.66. The value is much larger than expected [17]; however,
the results are largely influenced by an automatic sampling
procedure [12] (i.e., on average, only k/2 = 3.85 references
of each paper are also included in the network).

4. CONCLUSION 

The paper proposes a simple graph model that generates
networks with most common properties of real-world
networks and, in contrast to many other models, disassortative
degree mixing. The model also has a natural interpretation
for citation networks with different practical applications.

For simplicity, the analysis in the paper is based on
undirected networks. However, this presents a serious
limitation, especially for citation networks considered here.
Future work will extend the analysis to directed and also other
types of networks, while more reliable datasets will be used
for the analysis of author citation dynamics (based on DBLP
and WoS data). Furthermore, the model will be rigorously
compared against others with similar characteristics [20].